I have used R (not Stata) for tihs assignment for two reasons: (i) I don’t have a Stata license (I am not part of the Master’s programme) and (ii) I prefer to use open-source software whenever possible (David 2012). Please let me know if this is a problem.
Use the Stata dataset called “panamadata.dta” that is provided in the Week 3 subdirectory in Canvas. This contains some of the variables from the Panama Living Standards Measurement Survey, 2008. Questions are as follows.
Data preparation and preliminary exploration
The below section shows the steps taken to read, process, and prepare the data for analysis.
# Attach libraries
library(readstata13) # for reading stata files
library(dplyr) # for dataframe manipulation
library(ggplot2) # for generating charts
library(databrew) # for making chart theme consistent; devtools::install_github('databrew', 'databrew')
library(DT) # for dynamic html tables
library(raster) # for pixel-based GIS
library(sp) # for polygon-based GI
library(rgeos); library(maptools) # more spatial tools
# Read the dta file
df <- read.dta13('panamadata.dta')
# Get a shapefile of Panama
pan <- getData(name = 'GADM', country = 'PAN', level = 1)
# Clean up the shapefile so that it's joinable with df
df$prov <- gsub('comarca ', '', df$prov)
pan@data$prov <- tolower(pan@data$NAME_1)
pan@data$prov[pan@data$prov == 'ngöbe buglé'] <- 'ngöbe bugle'
# Assume that panama oeste is the same as panama
pan@data$prov[pan@data$prov == 'panamá oeste'] <- 'panamá'
# Create an area
pan@data <-
left_join(x = pan@data,
y = df %>% group_by(prov) %>% summarise(area = dplyr::first(area)),
by = 'prov')
# Take a look at our plot
plot(pan)# Make a ggplot compatible version of our plot too
pan_fortified <- broom::tidy(pan, region = 'prov')
pan_fortified <-
left_join(x = pan_fortified %>% mutate(prov = id),
y = df %>% group_by(prov) %>% summarise(area = dplyr::first(area)),
by = 'prov')
# Define function for mapping
mapper <- function(map = pan,
data = x,
tile = 'Stamen.Toner',
palette = 'YlOrRd',
show_legend = TRUE){
map@data <- map@data %>% left_join(data, by = names(x)[1])
require(dplyr)
require(leaflet)
require(RColorBrewer)
bins <- sort(unique(round(c(c(0, quantile(map@data$percent, seq(0, 1, by = 0.1), na.rm = TRUE), max(map@data$percent, na.rm = TRUE) * 1.1)))))
pal <- colorBin(palette, domain = map@data$percent, bins = bins)
map <- map[!is.na(map@data$percent),]
popper <- paste0(map@data$prov, ': ', map@data$percent)
l <- leaflet(data = map) %>%
addProviderTiles(tile)
if(show_legend){
l <- l %>%
addLegend(pal = pal, values = ~percent, opacity = 0.7, title = NULL,
position = "bottomleft")
}
l <- l %>%
addPolygons(fillColor = ~pal(percent),
fillOpacity = 0.8,
color = "#BDBDC3",
weight = 1,
# popup = popper,
highlight = highlightOptions(
weight = 5,
color = "#666",
dashArray = "",
fillOpacity = 0.7,
bringToFront = TRUE),
label = popper,
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "15px",
direction = "auto"))
return(l)
}
# Calculate poverty gaps
df$poverty_gap <- df$cons_pc / df$povline
df$extreme_poverty_gap <- df$cons_pc / df$xpovline
# Define whether poor or not
df$poor <- df$poverty_gap < 1
# Arrange dataset by poverty
df <- df %>% arrange(poverty_gap) %>%
# Calculate order
mutate(id = 1:nrow(df)) %>%
# Calculate percentile
mutate(p = id / max(id) * 100) %>%
# Calculate cumulative amounts and people
mutate(cum_people = cumsum(hhsize),
cum_consumption = cumsum(consumption))
# Plot the poverty incidence curve
ggplot(data = df, aes(x = cons_pc,
y = cum_consumption /
max(cum_consumption) * 100)) +
geom_area(alpha = 0.6,
fill = 'darkorange') +
geom_line(lty = 2) +
theme_databrew() +
labs(x = 'Consumption per capita',
y = '% of population',
title = 'Poverty incidence curve',
subtitle = 'With first-order stochastic dominance') +
ylim(0, 100)Using the data file on Panama, compute the following: headcount, poverty gap, squared poverty gap…
Produce calculations using the ordinary as well as the extreme poverty lines.
# a) at the national level
sum(df$hhsize[df$poor]) # actual count[1] 11141
sum(df$hhsize[df$poor]) / sum(df$hhsize) * 100 # percent[1] 41.01686
# b) at the area level (rural, urban, indigenous)
x <- df %>%
group_by(area) %>%
summarise(headcount_poor = sum(hhsize[poor]),
population = sum(hhsize)) %>%
mutate(percent = headcount_poor / population * 100)
# Table
x| area | headcount_poor | population | percent |
|---|---|---|---|
| urbana | 2463 | 13499 | 18.24580 |
| rural | 5766 | 10625 | 54.26824 |
| indígena | 2912 | 3038 | 95.85253 |
# Chart
ggplot(data = x,
aes(x = area,
y = percent)) +
geom_bar(stat = 'identity',
fill = 'darkorange',
alpha = 0.6) +
geom_label(aes(label = paste0(round(percent, digits = 2), '%'))) +
theme_databrew() +
labs(x = 'Area',
y = 'Poverty headcount',
title = 'Poverty headcount by area')# Map
mapper(tile = 'Stamen.Toner', data = x)# c) at the provincial level
x <- df %>%
group_by(prov) %>%
summarise(headcount_poor = sum(hhsize[poor]),
population = sum(hhsize)) %>%
mutate(percent = headcount_poor / population * 100)
# Table
x| prov | headcount_poor | population | percent |
|---|---|---|---|
| bocas del toro | 1617 | 2762 | 58.54453 |
| chiriquí | 650 | 2001 | 32.48376 |
| coclé | 885 | 1684 | 52.55344 |
| colón | 405 | 1584 | 25.56818 |
| darién | 961 | 1678 | 57.27056 |
| emberá | 124 | 142 | 87.32394 |
| herrera | 522 | 1533 | 34.05088 |
| kuna yala | 467 | 512 | 91.21094 |
| los santos | 395 | 1262 | 31.29952 |
| ngöbe bugle | 1752 | 1789 | 97.93181 |
| panamá | 2460 | 10556 | 23.30428 |
| veraguas | 903 | 1659 | 54.43038 |
# Chart
ggplot(data = x,
aes(x = prov,
y = percent)) +
geom_bar(stat = 'identity',
fill = 'darkorange',
alpha = 0.6) +
geom_label(aes(label = paste0(round(percent, digits = 2), '%'))) +
theme_databrew() +
labs(x = 'Province',
y = 'Poverty headcount',
title = 'Poverty headcount by province') +
theme(axis.text.x = element_text(angle = 90))# Map
mapper(tile = 'Stamen.Terrain', data = x)# Formula for poverty gap index is (1 / total population) * sum((poverty line - consumption)/ poverty line)
# a) at the national level
weighted.mean(df$poverty_gap, w = df$hhsize)[1] 1.843468
# b) at the area level (rural, urban, indigenous)
x <- df %>%
group_by(area) %>%
summarise(poverty_gap = weighted.mean(poverty_gap, w = hhsize))
# Table
x| area | poverty_gap |
|---|---|
| urbana | 2.6229940 |
| rural | 1.2771785 |
| indígena | 0.3602598 |
# Chart
ggplot(data = x,
aes(x = area,
y = poverty_gap)) +
geom_bar(stat = 'identity',
fill = 'darkorange',
alpha = 0.6) +
geom_label(aes(label = paste0(round(poverty_gap, digits = 2), '%'))) +
theme_databrew() +
labs(x = 'Area',
y = 'Poverty gap',
title = 'Poverty gap index by area')# Map
mapper(tile = 'Esri.DeLorme', data = x %>% mutate(percent = poverty_gap), palette = 'Purples')# c) at the provincial level
x <- df %>%
group_by(prov) %>%
summarise(poverty_gap = weighted.mean(poverty_gap, w = hhsize))
# Table
x| prov | poverty_gap |
|---|---|
| bocas del toro | 1.3887111 |
| chiriquí | 1.8711549 |
| coclé | 1.3392145 |
| colón | 2.0718112 |
| darién | 1.1300094 |
| emberá | 0.6430642 |
| herrera | 1.8512499 |
| kuna yala | 0.5552103 |
| los santos | 2.0322630 |
| ngöbe bugle | 0.2640200 |
| panamá | 2.5292465 |
| veraguas | 1.2718633 |
# Chart
ggplot(data = x,
aes(x = prov,
y = poverty_gap)) +
geom_bar(stat = 'identity',
fill = 'darkorange',
alpha = 0.6) +
geom_label(aes(label = paste0(round(poverty_gap, digits = 2), '%'))) +
theme_databrew() +
labs(x = 'Province',
y = 'Poverty gap',
title = 'Poverty gap index by province') +
theme(axis.text.x = element_text(angle = 90))# Map
mapper(tile = 'Stamen.Watercolor', data = x %>% mutate(percent = poverty_gap), palette = 'Greens')# a) at the national level
weighted.mean(df$poverty_gap, w = df$hhsize) ^2[1] 3.398376
# b) at the area level (rural, urban, indigenous)
x <- df %>%
group_by(area) %>%
summarise(poverty_gap = weighted.mean(poverty_gap, w = hhsize)^2)
# Table
x| area | poverty_gap |
|---|---|
| urbana | 6.8800978 |
| rural | 1.6311848 |
| indígena | 0.1297871 |
# Chart
ggplot(data = x,
aes(x = area,
y = poverty_gap)) +
geom_bar(stat = 'identity',
fill = 'darkorange',
alpha = 0.6) +
geom_label(aes(label = paste0(round(poverty_gap, digits = 2), '%'))) +
theme_databrew() +
labs(x = 'Area',
y = 'Squared poverty gap',
title = 'Squared poverty gap index by area')# Map
mapper(tile = 'Esri.WorldImagery', data = x %>% mutate(percent = poverty_gap), palette = 'Greens')# c) at the provincial level
x <- df %>%
group_by(prov) %>%
summarise(poverty_gap = weighted.mean(poverty_gap, w = hhsize)^2)
# Table
x| prov | poverty_gap |
|---|---|
| bocas del toro | 1.9285186 |
| chiriquí | 3.5012205 |
| coclé | 1.7934954 |
| colón | 4.2924016 |
| darién | 1.2769212 |
| emberá | 0.4135316 |
| herrera | 3.4271263 |
| kuna yala | 0.3082585 |
| los santos | 4.1300930 |
| ngöbe bugle | 0.0697065 |
| panamá | 6.3970880 |
| veraguas | 1.6176362 |
# Chart
ggplot(data = x,
aes(x = prov,
y = poverty_gap)) +
geom_bar(stat = 'identity',
fill = 'darkorange',
alpha = 0.6) +
geom_label(aes(label = paste0(round(poverty_gap, digits = 2), '%'))) +
theme_databrew() +
labs(x = 'Province',
y = 'Squared poverty gap',
title = 'Squared poverty gap index by province') +
theme(axis.text.x = element_text(angle = 90))# Map
mapper(tile = 'Esri.WorldPhysical', data = x %>% mutate(percent = poverty_gap), palette = 'Oranges')Report the precision of your poverty estimates. (How can you take into account that the sample is ‘clustered’: clusters have been sampled randomly, and within clusters households have been sampled randomly?)
Given the clustering, we should use robust standard errors.
Using the data file on Panama calculate the between and with-in group contribution to per capita consumption inequality from the following population groups (based on the General Entropy inequality measures with parameter values 0, 1, and 2).
Note to professor: I use the IC2 package for these decomposition calculations (Plat 2012).
library(IC2)
df$prov <- factor(df$prov)
for (i in 0:2){
x <- decompAtkinson(x = df$cons_pc,
z = df$prov,
w = df$hhsize,
epsilon = i,
decomp = "BDA",
ELMO = TRUE)
cat(paste0('#### Epsilon: ', i, ' -----------------------------------------------------------\n'))
summary(x)
}#### Epsilon: 0 -----------------------------------------------------------
Atk epsilon
1.1102e-16 0.0000e+00
Decomposition (BDA):
within between cross
-2.2204e-16 0.0000e+00 0.0000e+00
betweenELMO
1.1102e-16
#### Epsilon: 1 -----------------------------------------------------------
Atk epsilon
0.36133 1.00000
Decomposition (BDA):
within between cross
0.278419 0.114904 0.031991
betweenELMO
0.31409
#### Epsilon: 2 -----------------------------------------------------------
Atk epsilon
0.62464 2.00000
Decomposition (BDA):
within between cross
0.45453 0.31185 0.14175
betweenELMO
0.58025
df$area <- factor(df$area)
for (i in 0:2){
x <- decompAtkinson(x = df$cons_pc,
z = df$area,
w = df$hhsize,
epsilon = i,
decomp = "BDA",
ELMO = TRUE)
cat(paste0('#### Epsilon: ', i, ' -----------------------------------------------------------\n'))
summary(x)
}#### Epsilon: 0 -----------------------------------------------------------
Atk epsilon
1.1102e-16 0.0000e+00
Decomposition (BDA):
within between cross
-2.2204e-16 0.0000e+00 0.0000e+00
betweenELMO
1.1102e-16
#### Epsilon: 1 -----------------------------------------------------------
Atk epsilon
0.36133 1.00000
Decomposition (BDA):
within between cross
0.257313 0.140057 0.036038
betweenELMO
0.27511
#### Epsilon: 2 -----------------------------------------------------------
Atk epsilon
0.62464 2.00000
Decomposition (BDA):
within between cross
0.44212 0.32715 0.14464
betweenELMO
0.53336
df$family_size <- cut(df$hhsize, c(0, 3, 6, 10, 30))
df$family_size <- factor(df$family_size)
for (i in 0:2){
x <- decompAtkinson(x = df$cons_pc,
z = df$family_size,
w = df$hhsize,
epsilon = i,
decomp = "BDA",
ELMO = TRUE)
cat(paste0('#### Epsilon: ', i, ' -----------------------------------------------------------\n'))
summary(x)
}#### Epsilon: 0 -----------------------------------------------------------
Atk epsilon
1.1102e-16 0.0000e+00
Decomposition (BDA):
within between cross
0.0000e+00 1.1102e-16 0.0000e+00
betweenELMO
0
#### Epsilon: 1 -----------------------------------------------------------
Atk epsilon
0.36133 1.00000
Decomposition (BDA):
within between cross
0.271615 0.123171 0.033455
betweenELMO
0.31104
#### Epsilon: 2 -----------------------------------------------------------
Atk epsilon
0.62464 2.00000
Decomposition (BDA):
within between cross
0.49047 0.26331 0.12915
betweenELMO
0.56481
The assignment is to be turned in by 17:00 on Wednesday, November 22nd. Make sure to include your name. The assignment is to be sent by email to: p.f.lanjouw@vu.nl
The below is the data dictionary, as provided in the assignment description.
| variable_name | variable_label |
|---|---|
| Psu | primary sampling unit |
| Household | household id |
| Questionnaire | questionnaire number |
| Prov | Province |
| District | District |
| Region | Region |
| Area | area (urban/rural/indigenous) |
| Consumption | total household consumption in prices june 08 |
| cons_pc | consumption per capita in prices june 08 |
| Hhsize | household size |
| Xpovline | extreme poverty line |
| Povline | poverty line |
| Indweight | individual weight |
| Hhweight | household weight |
David, Steele Robert. 2012. The Open-Source Everything Manifesto. North Atlantic Books. http://realitysandwich.com/151036/open_source_everything_manifesto/.
Plat, Didier. 2012. IC2: Inequality and Concentration Indices and Curves. https://CRAN.R-project.org/package=IC2.